Abstract

We develop a Markov decision process (MDP) model to examine military medical
evacuation (MEDEVAC) dispatch policies. To solve the MDP, we apply an
approximate dynamic programming (ADP) technique. The problem of deciding which
aeromedical asset to dispatch to which service request is complicated by the service
locations and the priority class of each casualty event. We assume requests for
MEDEVAC arrive sequentially, with the location and the priority of each casualty known
upon initiation of the request. The proposed model finds a high quality dispatching
policy which outperforms the traditional myopic policy of sending the nearest
available unit. Utility is gained by servicing casualties based on both their priority and
the actual time until a casualty arrives at a medical treatment facility (MTF). The
model is solved using approximate policy iteration (API) and least squares temporal
difference (LSTD). Computational examples are used to investigate dispatch policies
for a scenario set in northern Syria. Results indicate that a myopic policy is not
always the best policy to use for quickly dispatching MEDEVAC units, and insight
is gained into the value of specific MEDEVAC locations.

Key  words:  Emergency  Medical  Dispatch,  medical  evacuation  (MEDEVAC),  Markov 
decision  processes,  approximate  dynamic  programming,  approximate  policy  iteration, 
least  squares  temporal  difference 
AN APPROXIMATE DYNAMIC PROGRAMMING MODEL
FOR  MEDEVAC  DISPATCHING 

I.  Introduction 

The  United  States  military  uses  two  classification  categories  when  evacuating 
injured soldiers and civilians from the battlefield. These are medical evacuation
(MEDEVAC)  and  casualty  evacuation  (CASEVAC).  The  primary  and  preferred  method 
is  MEDEVAC,  which  constitutes  dedicated  medical  personnel  on  board  the  vehicle 
that  is  transporting  casualties.  The  second  method,  often  used  as  a  contingency,  is 
CASEVAC,  in  which  there  are  no  dedicated  medical  personnel  on  board  to  attend  to 
a  casualty  event  while  in  transit  to  a  medical  treatment  facility  (MTF)  (Department 
of  the  Army,  2007).  Any  type  of  vehicle  may  be  used  to  conduct  MEDEVAC  and 
CASEVAC  operations. 

When  a  request  for  a  MEDEVAC  occurs,  there  are  three  categories  of  evacuation 
precedence  (Department  of  the  Army,  2007): 

Priority  I,  Urgent.  Assigned  to  emergency  cases  that  should  be  evacuated  as  soon 
as  possible  and  within  a  maximum  of  1  hour  in  order  to  save  life,  limb,  or  eyesight, 
to  prevent  complications  of  serious  illness,  or  to  avoid  permanent  disability. 

Priority  II,  Priority.  Assigned  to  sick  and  wounded  personnel  requiring  prompt 
medical  care.  This  precedence  is  used  when  the  individual  should  be  evacuated  within 
4  hours  or  when  an  individual’s  medical  condition  could  deteriorate  to  such  a  degree 
that  he  or  she  will  become  an  URGENT  precedence,  or  whose  requirements  for  special 
treatment are not available locally, or who will suffer unnecessary pain or disability.


Priority III, Routine. Assigned to sick and wounded personnel requiring
evacuation but whose condition is not expected to deteriorate significantly. The sick and
wounded in this category should be evacuated within 24 hours.

Aerial MEDEVAC operations involve the use of dedicated helicopters, specifically
the UH-60 A/L Blackhawk. Helicopters are uniquely suited for MEDEVAC
operations because they can travel faster and farther than ground vehicles and can
access terrain that ground vehicles cannot reach. The ability to simultaneously treat and quickly
transport casualties from the point of injury (POI) to an MTF greatly increases the
chance of survival for casualties. During the Korean War, the United States military
experienced the first large-scale use of helicopters to remove casualties from the
battlefield. Currently, survivability of injuries on the battlefield is at a historic high;
90% of all casualties survive, compared to 84% in Vietnam and 80% in World War
II (Eastridge et al., 2012). This improvement is attributed primarily to the speed with
which casualties are able to receive proper medical attention. As United
States Army Surgeon Major General Neel Spurgeon stated:

“Getting the casualty and the physician together as soon as possible is
the keystone of the practice of combat medicine...” (Spurgeon, 1991)

Recent conflicts have exhibited a shift from traditional force-on-force
engagements to counterinsurgency (COIN) operations. With COIN, units are typically much
smaller and more geographically dispersed, causing a greater dispersion of critical
resources. As the area of the battlefield increases, helicopters provide a lifeline to these
soldiers, allowing them to operate farther away from bases while still able to
receive aerial support. From the inception of Operation Enduring Freedom (OEF) in
2001 until 2008, more than 3200 casualties were transported using MEDEVAC
(Hartenstein, 2008).

When utilizing MEDEVAC assets, three different aspects need to be considered:
location, dispatching, and redeployment. The location of MEDEVAC assets is a
balance between maximizing coverage and minimizing response time. Placement may
be further constrained by force protection and maintenance necessities. Dispatching
of MEDEVAC units is often conducted using a myopic policy in which the nearest
MEDEVAC is launched to a POI regardless of its priority. The third aspect, dynamic
redeployment of ambulances, is also possible. However, communication and crew
limitations often make this problematic, and it is not typically performed.

In this thesis we consider the MEDEVAC dispatching problem in which a
dispatching authority must decide which MEDEVAC to send in response to a request
for MEDEVAC. Redeployment is not considered. A Markov decision process (MDP)
is constructed to model this MEDEVAC dispatching problem. We utilize an
Approximate Dynamic Programming (ADP) approach to obtain high quality solutions to the
problem. The proposed ADP algorithm utilizes Least Squares Temporal Difference
(LSTD) policy evaluation within an Approximate Policy Iteration (API) framework.
Bellman error minimization is applied in the policy improvement phase to obtain
improved policies. To demonstrate the applicability of the model to the medical planning
process, we present a notional scenario involving the allied defense of northern Syria
in response to aggression by Islamic State (IS) militants.

This thesis is organized as follows. In Chapter 2 we review related research in the
contemporary literature. In Chapter 3 we present the MDP formulation of the
problem and the ADP algorithm to solve it. Chapter 4 contains the computational scenario
and testing results. In Chapter 5 we present the findings and conclusions.
II. Literature Review


Our literature review focuses on two research areas. First, we examine research in
emergency medical systems (EMS) and MEDEVAC in order to inform the model of
the MEDEVAC dispatching problem. Next, we review research in the field of ADP
to inform the development of the solution methodology.

2.1  EMS  and  MEDEVAC 

The nature of MEDEVAC operations shares many inherent similarities with EMS.
Decisions need to be made quickly regarding which unit will serve a specific casualty
event. We initially examine research into EMS optimization, which can be traced
back to the late 1960s and early 1970s with papers on optimally locating EMS
units. Church & ReVelle (1974) examine the maximal coverage location problem
(MCLP), ensuring there is an ambulance within a specific distance or time from a
POI.

ReVelle & Hogan (1989) introduce a MCLP which builds upon the basic
maximal covering problem by ensuring there is always an ambulance available within a
predetermined length of time in the event of another ambulance being unavailable.
Alsalloum  &  Rand  (2006)  extend  the  approach  used  by  ReVelle  &  Hogan  (1989);  they 
determine  the  minimum  number  of  vehicles  required  to  cover  the  largest  possible  area 
given  a  set  of  constraints. 

Batta  et  al.  (1989)  extend  the  MCLP  by  examining  not  only  busy  probabilities 
for  EMS  units,  but  also  the  queuing  of  calls  via  the  Hypercube  model.  Silva  &  Serra 
(2008)  expand  upon  covering  problems  by  incorporating  queuing  theory  as  well  as 
establishing different patient priority levels.

Although locating assets is a critical aspect of EMS and MEDEVAC systems,
the limitations for MEDEVAC placement are beyond the scope of this thesis. We
examine how to optimize dispatching. Initial research by Carter et al. (1972) found
that  dispatching  the  nearest  unit  to  a  service  did  not  always  produce  the  lowest 
average  response  time.  This  result  is  very  important,  as  most  EMS  and  MEDEVAC 
systems  operate  using  a  myopic  policy  of  sending  the  nearest  unit  to  any  patient. 

More recent research by McLay & Mayorga (2010) utilizes real-world EMS data
to examine optimal dispatching policies. In their model, McLay & Mayorga consider
fixed bases for responding units and use response time thresholds (RTT) as a
measure for patient survivability. McLay & Mayorga obtain policies that decrease
RTT for high priority patients compared to a myopic policy.

Bandara et al. (2012) build upon the work of McLay & Mayorga (2010) using
patient survivability instead of RTT. This measure more closely mirrors patient
outcomes, providing more insight into the significance of any improvement. Mayorga et al.
(2013) continue to add depth to dispatching research by designing a constructive
heuristic to identify response districts for EMS units. They compare system
performance between respective policies of allowing units to service other districts and
forcing them to stay in their own district. These results are compared against myopic
and heuristic policies, and it is shown that all three policies perform better than the
myopic policy.

In order to optimize the MEDEVAC system, an objective function must be
determined. Erkut et al. (2008) question the terms “coverage” and “performance” for an
EMS system. They propose the use of a monotonically decreasing function over time
for the probability of patient survivability as a measure of performance. Subsequent
work by Bandara et al. (2012) and Grannan et al. (2014) similarly use survivability
functions instead of the traditional RTT as a performance measure for their model.
A problem with the survivability function is the difficulty of finding empirical evidence
to support a particular functional form. Eastridge et al. (2012) provide extensive
statistics about combat deaths, but the response times are not known, preventing
the creation of an appropriate survivability function. Although Feero et al. (1995)
examine EMS response times relative to trauma survivability, their work is limited
to response times under eight minutes. MEDEVAC units typically need to travel
significantly farther than EMS units, and so response times are significantly longer.
The goal of U.S. MEDEVAC support is to respond within 60 minutes from
notification to drop-off of the patient at a MTF (Garrett, 2013). The one-hour response is
sometimes referred to as the “Golden Hour”. Lerner & Moscati (2001) examine the
“Golden Hour,” which is mentioned in numerous trauma articles. However, its exact
origins and any quantifiable measure are not reported. We assume an exponentially
decreasing function to model patient outcome.

Research specifically into military MEDEVAC systems has been conducted
recently by Zeto et al. (2006), Fulton et al. (2010), Bastian (2010), Grannan et al.
(2014) and Keneally et al. (2014). Much of the research in military MEDEVAC
concerns optimal emplacement of assets. Zeto et al. use a maxi-min goal programming
approach adapted from Alsalloum & Rand (2006) in order to maximize coverage and
minimize the response times for MEDEVAC units in the Afghanistan theater. Fulton
et al. (2010) develop a stochastic model to determine where to emplace multiple
medical assets such as air and ground MEDEVAC units. Bastian (2010) develops a model
focusing on the performance measures of the UH-60 A/L combined with the austere and
hostile conditions in which it operates. Using a combination of goal programming
and stochastic optimization, Bastian (2010) seeks to optimally emplace MEDEVAC
assets in Afghanistan.

Keneally et al. (2014) develop an MDP to examine MEDEVAC dispatch policy in
Regional Command South (RC-S) of the OEF theater. They use two priority
classifications, a reward function based on RTT, and Monte Carlo simulation with a Hawkes
process for casualty generation. They find that myopic policies do not always lead
to optimal performance.

2.2  ADP 

We  also  consider  an  MDP  model.  However,  the  curse  of  dimensionality  prevents 
us  from  applying  the  MDP  solution  techniques  used  by  Keneally  et  al.  (2014).  We 
turn  our  attention  to  ADP  which  has  emerged  as  a  technique  to  solve  large  or  complex 
problems  for  making  sequential  decisions  under  uncertainty.  Powell  (2012)  provides 
a  broad  overview  of  ADP  and  its  origins  from  different  research  communities.  He 
found  many  communities  use  similar  methods  or  algorithms  to  battle  the  curse  of 
dimensionality that many systems or problems face. The three curses of
dimensionality that affect systems concern the state space, the outcome space, and the action
space. Powell presents four classes of policies, where a policy is a mapping from a state to
an action: myopic cost function approximations, look-ahead policies, policy
function approximations, and value function approximations. ADP approaches also
incorporate hybrid policies, i.e., combinations of two or more classes. Researchers
are able to employ ADP techniques to determine such policies and solve
otherwise intractable MDPs. We use value function approximation when solving the
dispatching  problem. 

Maxwell et al. (2010) present an ADP approach to examine the potential of
dynamically redeploying ambulances to maximize the number of patients that are served
within an RTT. They set the framework of their system using an MDP with a similar
state space to the MEDEVAC problem. At each event occurrence in their system,
such as a new call arrival or an ambulance bringing a patient to the hospital, a decision
is made to redeploy any available units to better cover the service area. They obtain
a redeployment policy for this high dimensional system using approximate value
iteration. They are able to demonstrate a significant improvement in EMS response
in two metropolitan cities using dynamic redeployment versus a static myopic policy.

Although there is much potential in dynamic routing and positioning of
MEDEVAC units, the complexity of helicopter operations limits the ease and practicality of
dynamic  routing.  This  thesis  assumes  all  MEDEVAC  units  return  to  their  base  upon 
service  completion. 

Bradtke & Barto (1996) introduce three temporal difference algorithms:
normalized temporal difference (NTD), recursive least squares temporal difference (RLSTD),
and LSTD. Temporal difference learning allows the system to learn the expected value
for a state-action pair. They prove all three converge to optimality when used with
a functional approximator. LSTD and RLSTD are able to extract more information
from each observation, which in turn allows each algorithm to converge to optimality
faster than NTD. LSTD and RLSTD do require more computational time for each
observation; however, this is offset by their faster convergence rate.

Lagoudakis & Parr (2003) introduce a least squares policy iteration (LSPI)
algorithm which builds upon LSTD. Policy iteration (PI), first introduced by Howard
(1960), is a simple two-step iterative algorithm for stationary policies. First, the
policy is evaluated to determine its value. Second, we attempt to improve the policy by
finding a policy variant which is monotonically increasing in value. API introduces
approximations to represent the value function and policy in order to make these
problems computationally tractable. Lagoudakis & Parr (2003) compare the
performance of LSPI to other reinforcement learning algorithms. They find LSPI performs
significantly better than Q-learning and is computationally faster.


III.  Methodology 


3.1  MDP  Formulation 


When  a  casualty  occurs  and  a  MEDEVAC  request  is  received,  a  decision  must  be 
made  quickly  regarding  which  MEDEVAC  asset  to  dispatch.  Any  delays  in  decision 
making  affect  casualties’  survivability.  Thus,  it  is  critical  to  quickly  and  accurately 
determine  a  high  quality  solution.  The  stochastic  elements  in  the  model  are  depicted 
in  Figure  1. 


[Figure 1 timeline, events in order: Casualty Event Occurs; Assign MEDEVAC;
MEDEVAC Departs; MEDEVAC Arrives at POI; MEDEVAC Departs POI; MEDEVAC Arrives
at MTF; MEDEVAC Departs MTF; MEDEVAC Returns to Base. Labeled intervals:
Casualty Event Service Time; MEDEVAC Busy Time.]

Figure 1. MEDEVAC Dispatch Timeline


We consider three casualty event categories: urgent, priority, and routine. In the
model, casualties are generated from clusters using a Poisson Hawkes process with rate
$\lambda$. Response times are independent for each casualty event and priority classification.
A monotonically decreasing function based on service time and casualty category is
used as the reward function. We incorporate queuing in the model to allow multiple
casualty events to occur and to allow the decision to wait before launching a MEDEVAC. The
decision to wait can be advantageous if a low priority casualty event has occurred
while other MEDEVAC units are busy; it may be better to remain on standby until
other MEDEVAC units become available in case a high priority casualty event occurs.
All casualties are evacuated to the nearest MTF. After a group of casualties is dropped
off, MEDEVAC units return to their originating base.

To model the system, we base the framework on the work of Maxwell et al. (2010).
We establish the state space for the MDP model as a tuple with a time state, an
event state, a MEDEVAC status vector, and a queue status vector. We represent
the system with the tuple $s = (\tau, e, M, Q)$, wherein $\tau$ corresponds to the current
system time and $e$ corresponds to the current event. The main system components
are $M = (m_1, m_2, \ldots, m_a)$ and $Q = (q_1, q_2, \ldots, q_b)$, where $m_i$ contains information
about the $i$th MEDEVAC, $a$ represents the maximum number of MEDEVACs in the
system, $q_j$ contains information about the $j$th casualty event in the queue, and $b$
represents the maximum number of casualty events allowed in the queue. The state
of MEDEVAC $i$ is given as a tuple $m_i = (a_i, d_i, t_i)$, wherein $a_i$ is the status of the
MEDEVAC, $d_i$ is the expected time to complete the current movement, and $t_i$ is the
starting time of any MEDEVAC movement. Once a MEDEVAC is launched to service
a casualty event, the queue status is updated. After a MEDEVAC drops off a group of
casualties, it returns to base where it then becomes available to launch. The status of
the MEDEVAC, $a_i$, can be “idle”, “enroute to a casualty event”, “at POI”, “enroute
to MTF”, or “returning to base”. If the MEDEVAC is not idle, $t_i$ corresponds to the
starting time of the movement; otherwise, $t_i$ represents the time of the current event
cycle. The state of casualty event $j$ in the queue is $q_j = (\delta_j, l_j, \zeta_j, \eta_j)$, where $\delta_j$ is the
status of the casualty event in the $j$th position, $l_j$ is the location of the casualty event,
$\zeta_j$ is the time the casualty event arrived in the system, and $\eta_j$ is the priority of the
casualty event.
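
As a rough illustration, the state tuple described above could be held in a few small records. The following is a minimal Python sketch, not the thesis implementation; the class names, field names, and the coordinate-pair representation of a location are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Medevac:
    """One entry m_i of the MEDEVAC status vector M (field names are hypothetical)."""
    status: str           # a_i, e.g. "idle", "enroute to MTF", "returning to base"
    expected_done: float  # d_i, expected time the current movement completes
    start_time: float     # t_i, start time of the current movement


@dataclass
class CasualtyEvent:
    """One entry q_j of the queue status vector Q (field names are hypothetical)."""
    status: str                     # delta_j, status of the queued casualty event
    location: Tuple[float, float]   # l_j, assumed here to be an (x, y) pair
    arrival_time: float             # zeta_j, time the event entered the system
    priority: str                   # eta_j: "urgent", "priority", or "routine"


@dataclass
class State:
    """The system state s = (tau, e, M, Q)."""
    time: float                 # tau, current system time
    event: str                  # e, current event
    medevacs: List[Medevac]     # M, at most a entries
    queue: List[CasualtyEvent]  # Q, at most b entries


# Example: two MEDEVACs, one urgent casualty event waiting in the queue.
example_state = State(
    time=12.0,
    event="call arrives",
    medevacs=[Medevac("idle", 0.0, 12.0), Medevac("enroute to MTF", 30.0, 5.0)],
    queue=[CasualtyEvent("idle", (3.0, 4.0), 12.0, "urgent")],
)
```

Keeping each component as its own record mirrors the tuple notation, so a field such as `example_state.medevacs[1].expected_done` reads directly as $d_2$.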

Events are triggered by changes in the status of a MEDEVAC or an arrival of a call.
The event list is given in Table 1. The model assumes MEDEVAC dispatch decisions
only occur when an event occurs. Although it is possible to reroute a MEDEVAC
mid-flight, delay and confusion in communication can cause large problems and this
practice is not typical in combat operations.

Table 1. Event List

$e$, event list:

Call arrives and is placed in the $j$th position
MEDEVAC $i$ departs for call $j$ at casualty event
MEDEVAC $i$ arrives at casualty event for call $j$
MEDEVAC $i$ leaves call $j$ casualty event for MTF
MEDEVAC $i$ delivers call $j$ at MTF
MEDEVAC $i$ departs MTF to return to base
MEDEVAC $i$ arrives at base

Let $\mathcal{I}(s) = \{ i : a_i = \text{“idle”} \}$ denote the set of MEDEVACs available for
dispatching when the system is in state $s$.

Let $\mathcal{J}(s) = \{ j : \delta_j = \text{“idle”} \}$ denote the set of casualty events awaiting service by
MEDEVAC when the system is in state $s$.

To capture dispatching decisions we let $x_{ij}(s) = 1$ if MEDEVAC $i$ is deployed to
casualty event $j$ when the system is in state $s$, and 0 otherwise. The set of feasible
decisions can be written as:

$$\mathcal{X}(s) = \left\{ x(s) \in \{0, 1\}^{|\mathcal{I}(s)| \times |\mathcal{J}(s)|} : \sum_{i \in \mathcal{I}(s)} \sum_{j \in \mathcal{J}(s)} x_{ij}(s) \le 1 \right\}. \tag{1}$$
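
The feasible decision set can be enumerated directly when the numbers of idle MEDEVACs and waiting casualty events are small. The sketch below assumes that at most one MEDEVAC is dispatched per decision and that waiting (no dispatch) is always allowed; the function name `feasible_decisions` and the use of `None` for the wait decision are hypothetical choices for illustration.

```python
from itertools import product


def feasible_decisions(available, waiting):
    """List feasible dispatch decisions as (i, j) pairs, plus None for 'wait'.

    available: indices of idle MEDEVACs
    waiting:   indices of casualty events awaiting service
    Assumes at most one dispatch per decision (an assumption of this sketch).
    """
    decisions = [None]  # the all-zero decision: dispatch nothing and wait
    decisions.extend(product(available, waiting))
    return decisions


# Two idle MEDEVACs (0 and 2) and one waiting casualty event (1).
print(feasible_decisions([0, 2], [1]))
```

With no idle MEDEVAC or no waiting casualty event, the only feasible decision in this sketch is to wait.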

The trajectory of the system is denoted in the form $\{(s_k, x_k) : k = 1, 2, \ldots\}$, where
$s_k$ is the state of the system and $x_k$ is the decision after the $k$th event has occurred.

To capture the dynamics of the system, the following symbology is used:
$s_{k+1} = f(s_k, x_k, \omega(s_k, x_k))$, where $\omega(s_k, x_k)$ is a random element of an appropriate space
representation of the stochastic process related to casualty event arrivals and delays, and
where $f$ is the transfer function.
The reward function is determined by the value of being in the pre-decision state
$s_k$. The reward, $r(s_k, x_k, s_{k+1})$, is shown in Equation 2.

$$r(s_k, x_k, s_{k+1}) = \begin{cases}
\lambda_1 \cdot \ldots & \text{if } \eta_j = \text{“urgent” and } e(s_{k+1}) \text{ is such that casualty event } j \text{ was dropped off at a MTF,} \\
\lambda_2 \cdot 0.99^{(\tau(s_{k+1}) - \zeta_j)/240} & \text{if } \eta_j = \text{“priority” and } e(s_{k+1}) \text{ is such that casualty event } j \text{ was dropped off at a MTF,} \\
0 & \text{otherwise.}
\end{cases} \tag{2}$$

Tunable parameters $\lambda_1$ and $\lambda_2$ are used to model the value trade-off between
different priority casualty events. We normalize the time the casualty event has been
in the system based on requirements outlined in Department of the Army (2007).

The reward obtained by servicing higher priority casualties decays quickly; however,
servicing higher priority casualties obtains higher utility and they are therefore often
served first. For lower priority casualties, the decay allows enough time to wait before
being forced to launch immediately, which increases the flexibility of the decision
maker.
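
To make the trade-off concrete, the sketch below evaluates a reward of the general shape described above: a priority-specific weight multiplied by a decay base raised to the normalized time in system. All numeric values in `HYPOTHETICAL_PARAMS` are placeholders chosen for illustration, not the values used in this thesis, and the function name `dropoff_reward` is likewise an assumption.

```python
# Placeholder parameters (assumptions of this sketch, not the thesis values):
# weight ~ lambda, base ~ decay base, horizon_min ~ normalizing time in minutes.
HYPOTHETICAL_PARAMS = {
    "urgent": {"weight": 10.0, "base": 0.9, "horizon_min": 60.0},
    "priority": {"weight": 1.0, "base": 0.99, "horizon_min": 240.0},
}


def dropoff_reward(priority, arrival_time, dropoff_time, params=HYPOTHETICAL_PARAMS):
    """Reward earned when a casualty event is dropped off at an MTF.

    priority:     casualty precedence (eta_j)
    arrival_time: time the casualty event entered the system (zeta_j), minutes
    dropoff_time: time of the drop-off event, minutes
    Categories without an entry in params earn zero reward.
    """
    entry = params.get(priority)
    if entry is None:
        return 0.0
    elapsed = dropoff_time - arrival_time
    return entry["weight"] * entry["base"] ** (elapsed / entry["horizon_min"])


# An urgent casualty delivered after 30 minutes versus after 90 minutes.
print(dropoff_reward("urgent", 0.0, 30.0), dropoff_reward("urgent", 0.0, 90.0))
```

Under these placeholder values the urgent reward starts higher but loses value faster per minute than the priority reward, which is the qualitative behavior the paragraph above describes.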

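A policy $\pi(s) \in \mathcal{X}(s)$, where $\mathcal{X}(s)$ is the set of feasible decisions in Equation 1,
maps the state space to the action space. In this form, $\pi(s)$
is the action taken when the system is in state $s$. Following policy $\pi$, the state
trajectory of the system $\{s_k^\pi : k = 1, 2, \ldots\}$ evolves according to
$s_{k+1}^\pi = f(s_k^\pi, \pi(s_k^\pi), \omega(s_k^\pi, \pi(s_k^\pi)))$.
The objective function is given by Equation 3,

$$J^\pi(s) = \mathbb{E}\left[ \sum_{k=1}^{\infty} \gamma^{\tau(s_k^\pi)} \, r(s_k^\pi, \pi(s_k^\pi), s_{k+1}^\pi) \,\middle|\, s_1^\pi = s \right], \tag{3}$$

where $\gamma \in [0, 1)$ is the discount factor and $\tau(s_k^\pi)$ is the time at which the system is in
state $s_k^\pi$. The optimal policy $\pi^*$ maximizes the expected total discounted reward and
satisfies the following optimality equation, Equation 4.

$$J(s) = \max_{x \in \mathcal{X}(s)} \left\{ \mathbb{E}\left[ r(s, x, f(s, x, \omega(s, x))) + \gamma^{\tau(f(s, x, \omega(s, x))) - \tau(s)} J(f(s, x, \omega(s, x))) \right] \right\} \tag{4}$$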

The number of possible values created by this equation is uncountable. Even if
the state space were discrete, it would be computationally intractable. In the next
section  we  propose  an  ADP  approach  to  obtain  an  approximation  of  J(s).  Utilizing 
the  value  function  approximation  for  J(s),  high  quality  policies  are  constructed  and 
compared  to  a  simple  myopic  policy,  which  is  often  employed  in  practice. 


3.2  ADP  Formulation 

We use LSTD combined with API (as discussed by Scott et al. (2014)) to
approximate an optimal solution. API is very similar to policy iteration, which is used
to solve classical MDPs. To obtain an optimal policy $\pi^*$ we need to solve Equation
4. To solve the problem, we construct an approximation of the value function. We
employ a modified version of Bellman’s equation that uses post-decision state
variables. The post-decision state $s_k^x$ refers to the state of the system after being in state
$s_k$ and upon taking action $x_{ij}$. The post-decision state variable provides tremendous
computational advantages as its use eliminates the embedded expectation within the
Bellman equation. The value of being in state $s_k^x$ immediately after a decision is made
is denoted by $J^x(s_k^x)$. The relationship between $J(s)$ and $J^x(s_k^x)$ is defined as:

$$J^x(s_k^x) = \mathbb{E}\left[ J(s_{k+1}) \mid s_k^x \right]. \tag{5}$$

Bellman’s equation in the post-decision state becomes:

$$J^x(s_{k-1}^x) = \max_{x} \left\{ \mathbb{E}\left[ r(s_k, x) + \gamma J^x(s_k^x) \mid s_{k-1}^x \right] \right\}. \tag{6}$$

Despite the reduction in dimensionality with the use of the post-decision state,
Equation 6 is still computationally intractable for the model. We proceed by
developing a set of basis functions to approximate $J^x(s_k^x)$. Equation 7 shows the formulation
we use in the model.

$$\bar{J}^x(s_k^x) = \theta^\top \phi(s_k^x), \tag{7}$$

where $\phi(s_k^x)$ is a column vector of basis function values evaluated at the post-decision
state and $\theta$ is a column vector
of basis function weights. By substituting the value function approximation Equation
7 into the Bellman equation using the post-decision state variable (Equation 6), we
obtain the following expression:

$$\theta^\top \phi(s_{k-1}^x) = \mathbb{E}\left[ r(s_k, X^\pi(s_k \mid \theta)) + \gamma \theta^\top \phi(s_k^x) \mid s_{k-1}^x \right]. \tag{8}$$

Since the above equation is an approximation of the multidimensional model, a
linear model may not provide a fixed solution. However, we are still able to use this
representation as the foundation of the Least Squares Approximate Policy Iteration
(LSAPI) algorithm. We find the policy decision, $X^\pi(s_k \mid \theta)$, by solving Equation 9.

$$X^\pi(s_k \mid \theta) = \arg\max_{x} \left[ r(s_k, x) + \gamma \theta^\top \phi(s_k^x) \right]. \tag{9}$$
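
The decision rule in Equation 9 amounts to scoring each feasible decision by its immediate reward plus the discounted linear value estimate of the resulting post-decision state. A minimal Python sketch follows; the function name `choose_decision`, its callable arguments, and the default discount value are hypothetical and stand in for the model-specific reward, post-decision transition, and basis functions.

```python
def choose_decision(state, decisions, reward, post_state, features, theta, gamma=0.99):
    """Return the decision maximizing r(s, x) + gamma * theta . phi(s^x).

    decisions:  iterable of feasible decisions for this state
    reward:     callable (state, x) -> immediate reward
    post_state: callable (state, x) -> post-decision state
    features:   callable (post-decision state) -> list of basis function values
    theta:      list of basis function weights
    gamma:      discount factor (0.99 is a placeholder default)
    """
    def score(x):
        phi = features(post_state(state, x))
        value_estimate = sum(w * f for w, f in zip(theta, phi))
        return reward(state, x) + gamma * value_estimate

    return max(decisions, key=score)


# Toy usage: the state is a number, a decision adds to it, and rewards penalize x.
best = choose_decision(
    5, [0, 1, 2],
    reward=lambda s, x: -float(x),
    post_state=lambda s, x: s + x,
    features=lambda sx: [1.0, float(sx)],
    theta=[0.0, 4.0],
    gamma=0.5,
)
print(best)
```

Because the post-decision state is deterministic given the state and decision, no expectation needs to be computed inside the maximization.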

The API algorithm shown in Table 2 was introduced by Bradtke & Barto (1996)
and modified by Ma & Powell (2010). Starting with an initial $\theta$ for the base policy,
we then step into the policy improvement loop. To evaluate the performance of
the policy, the post-decision state $S^x_{k-1,h}$ is randomly sampled and the value $\phi(S^x_{k-1,h})$ is
recorded. Next, we simulate one event forward and determine the optimal decision
based on Equation 9, recording the associated reward, $r(S_{k,h})$, and basis function
values of the post-decision state, $\phi(S^x_{k,h})$. After completing the sampling of the
post-decision state space, we evaluate the performance of the current policy. We also
introduce a harmonic step-size rule, as indicated in Equation 10, to smooth $\theta$.
Smoothing is required because we are sampling the state space to approximate the
Bellman Equation 4. Were Equation 4 computationally tractable, the model could
be solved using traditional value iteration, and a smoothing function would not be
required (Powell, 2009). The parameters $a$, $G$, and $H$ are all tunable, where $a$ is
a step size parameter, $G$ is the number of policy improvement iterations completed,
and $H$ is the number of policy evaluation iterations completed.
Table 2. API Algorithm

Approximate Policy Iteration Algorithm

Step 1:  Initialize $\theta$.
Step 2:  for $g = 1$ to $G$ (Policy Improvement Loop)
Step 3:    for $h = 1$ to $H$ (Policy Evaluation Loop)
Step 4:      Simulate a random post-decision state, $S^x_{k-1,h}$.
Step 5:      Record $\phi(S^x_{k-1,h})$.
Step 6:      Simulate the state transition for the next event to get $S_{k,h}$.
Step 7:      Determine the decision, $x_{ij} = X^\pi(S_{k,h} \mid \theta)$, using Equation 9.
Step 8:      Record $r(S_{k,h})$.
Step 9:      Record $\phi(S^x_{k,h})$.
Step 10:   End.
Step 11:   Update $\theta$ using Equation 11 and step size Equation 10.
Step 12: End.
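
The steps of Table 2 can be sketched as a generic loop in which every model-specific piece is passed in as a function. This is an illustrative skeleton under stated assumptions, not the thesis code: all argument names are hypothetical, `fit` stands in for the regression of Equation 11, and the smoothing line assumes the harmonic step size $a/(a+g-1)$ blends the newly fitted weights with the previous ones.

```python
def approximate_policy_iteration(theta, G, H, a, sample_post_state, simulate_next,
                                 decide, reward, post_state, features, fit):
    """Skeleton of the policy improvement / policy evaluation loops.

    theta: initial basis function weights (Step 1)
    G, H:  numbers of policy improvement and policy evaluation iterations
    a:     step size parameter of the harmonic rule
    The remaining arguments are callables supplied by the model (hypothetical names).
    """
    for g in range(1, G + 1):                          # Step 2
        phi_prev, phi_next, rewards = [], [], []
        for _ in range(H):                             # Step 3
            sx_prev = sample_post_state()              # Step 4
            phi_prev.append(features(sx_prev))         # Step 5
            s = simulate_next(sx_prev)                 # Step 6
            x = decide(s, theta)                       # Step 7
            rewards.append(reward(s, x))               # Step 8
            phi_next.append(features(post_state(s, x)))  # Step 9
        theta_hat = fit(phi_prev, phi_next, rewards)   # Step 11: regression
        alpha = a / (a + g - 1)                        # harmonic step size
        theta = [alpha * new + (1.0 - alpha) * old
                 for new, old in zip(theta_hat, theta)]
    return theta
```

Passing the simulator and regression in as callables keeps the loop structure visible and lets each component be tested in isolation.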

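$$\theta \leftarrow \frac{a}{a + g - 1} \, \hat{\theta} + \left( 1 - \frac{a}{a + g - 1} \right) \theta, \tag{10}$$

where $\hat{\theta}$ denotes the weights obtained from the regression in Equation 11.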


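To evaluate the policy, we apply least squares regression. The basis function evaluations
of the post-decision states, $\phi(S^x_{k-1,h})$ and $\phi(S^x_{k,h})$, are regressed
against the reward, $r(S_{k,h})$. We first establish the basis function matrices and
reward vector. Let

$$\Phi_{k-1} \overset{\text{def}}{=} \begin{bmatrix} \phi(S^x_{k-1,1})^{\top} \\ \vdots \\ \phi(S^x_{k-1,PE})^{\top} \end{bmatrix}, \quad
\Phi_{k} \overset{\text{def}}{=} \begin{bmatrix} \phi(S^x_{k,1})^{\top} \\ \vdots \\ \phi(S^x_{k,PE})^{\top} \end{bmatrix}, \quad
r(S_k) \overset{\text{def}}{=} \begin{bmatrix} r(S_{k,1}) \\ \vdots \\ r(S_{k,PE}) \end{bmatrix},$$

with the row index $PE$ denoting the last policy evaluation sample,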
where the rows of matrices $\Phi_{k-1}$ and $\Phi_k$ are basis function evaluations of the
sampled post-decision states, $r_k = r(S_k)$ is the reward vector for the sampled events,
and $\hat{r}_k$ is the expected reward. The difference between $r_k$ and $\hat{r}_k$ is
what we will refer to as the Bellman error, which we seek to minimize. Lagoudakis & Parr
(2003) show that, by minimizing Equation 11, an improved set of $\theta$ values can be
attained,

$$\min \; \lVert r_k - \hat{r}_k \rVert_2 \qquad (11)$$
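
For illustration, the regression in Equation 11 can be sketched with NumPy. The sketch
assumes the expected reward is linear in $\theta$ through the temporal difference of the
basis function values, $\hat{r}_k = (\Phi_{k-1} - \gamma \Phi_k)\theta$, with a discount
factor $\gamma$. That form, the symbol $\gamma$, and the function name `fit_theta` are
assumptions made here for the example, not definitions taken from the text.

```python
import numpy as np


def fit_theta(phi_prev: np.ndarray, phi_next: np.ndarray,
              rewards: np.ndarray, gamma: float) -> np.ndarray:
    """Least squares fit of theta, assuming r_hat = (phi_prev - gamma * phi_next) @ theta."""
    design = phi_prev - gamma * phi_next
    # lstsq minimizes ||rewards - design @ theta||_2 and tolerates rank deficiency.
    theta, *_ = np.linalg.lstsq(design, rewards, rcond=None)
    return theta
```

A least squares solver is used instead of an explicit matrix inverse because it still
returns a minimum-norm answer when basis function columns are nearly collinear.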

To capture the dynamics of this model, we need to properly represent the approximations
of the model with basis functions. Basis functions can be very difficult to develop
(Powell, 2012). We start by creating an indicator variable for whether MEDEVAC $i$ is
available, as represented by Equation 12,

$$\phi_{1,i} = \begin{cases} 1 & \text{if } m_i = \text{“Idle”} \\ 0 & \text{otherwise} \end{cases} \qquad \forall\, i = 1, 2, \ldots, a. \qquad (12)$$

To capture the expected time until a MEDEVAC becomes available, the second basis
function is defined in Equation 13, where $\hat{d}_{i,k}$ represents the expected time for
MEDEVAC $i$ to return to base after event $k$, dropping off casualties at an MTF. This
expected time is added to $d_i - \tau$, which is the expected time for the MEDEVAC to
complete its current movement.

$$\phi_{2,i} = \begin{cases} d_i - \tau + \hat{d}_{i,k} & \text{if } m_i \neq \text{“Idle”} \\ 0 & \text{otherwise} \end{cases} \qquad \forall\, i = 1, 2, \ldots, a \qquad (13)$$


The next basis functions, $\phi_{3,i}$, $\phi_{4,i}$, and $\phi_{5,i}$, capture the status
of all casualty events currently being served. Basis function $\phi_{3,i}$ represents the
expected time from $\tau$ until MEDEVAC $i$ arrives at the nearest MTF with its assigned
casualty event.

$$\phi_{3,i} = \begin{cases} d_i - \tau & \text{if } m_i = \text{“Serving casualty event” } j \\ 0 & \text{otherwise} \end{cases} \qquad \forall\, i = 1, 2, \ldots, a \qquad (14)$$

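Basis function $\phi_{4,i}$ captures the total expected time (including launch delay) that
a casualty event will be in the system if it is served by MEDEVAC $i$.

$$\phi_{4,i} = \begin{cases} \text{total expected time in system of casualty event } j & \text{if } m_i = \text{“Serving casualty event” } j \\ 0 & \text{otherwise} \end{cases} \qquad \forall\, i = 1, 2, \ldots, a \qquad (15)$$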

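Basis function $\phi_{5,i}$ captures the priority of the casualty event being served by
MEDEVAC $i$, where $p_j$ denotes the priority of casualty event $j$.

$$\phi_{5,i} = \begin{cases} p_j & \text{if } m_i = \text{“Serving casualty event” } j \\ 0 & \text{otherwise} \end{cases} \qquad \forall\, i = 1, 2, \ldots, a \qquad (16)$$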

The final basis function calculates the expected time in system for casualty event
$j$ if it is to be assigned MEDEVAC $i$,

$$\phi_{6,i,j} = d_{i,j} + \phi_{2,i} \qquad \forall\, i = 1, 2, \ldots, a, \; j = 1, 2, \ldots, b, \qquad (17)$$

where $d_{i,j}$ represents the expected service time from MEDEVAC base $i$ to serve
casualty event $j$ and land at the nearest MTF. We use these basis functions as well as
their interaction terms in order to approximate the value function.
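
As a rough illustration of how such features can be evaluated, the sketch below computes
the idle indicator of Equation 12 and the time-to-MTF feature of Equation 14 for a list of
MEDEVACs. The `Medevac` record and its field names are hypothetical, and the sketch
assumes $d_i$ is stored as a clock time so that $d_i - \tau$ is the time remaining.

```python
from dataclasses import dataclass
from typing import List

IDLE = "Idle"
SERVING = "Serving casualty event"


@dataclass
class Medevac:
    status: str                    # m_i, e.g. IDLE or SERVING
    completion_time: float = 0.0   # d_i, assumed to be a clock time


def phi_1(units: List[Medevac]) -> List[float]:
    """Equation 12: 1 if MEDEVAC i is idle, 0 otherwise."""
    return [1.0 if u.status == IDLE else 0.0 for u in units]


def phi_3(units: List[Medevac], tau: float) -> List[float]:
    """Equation 14: d_i - tau while serving a casualty event, 0 otherwise."""
    return [u.completion_time - tau if u.status == SERVING else 0.0
            for u in units]
```

Each function returns one value per MEDEVAC, so the results can be concatenated into a
single feature vector for the regression.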


3.3  Simulation 


To compare the performance of the ADP policy against the myopic policy, we simulate
multiple trajectories under each policy. For the myopic policy, MEDEVAC requests are
served in decreasing order of priority, with first-in-first-out ordering for like
priorities. The flow chart for the model is shown in Figure 2.
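
The myopic rule can be sketched in a few lines of Python. This is an assumed reading, not
the thesis code: requests are ordered by priority and then by arrival time, and the
nearest idle MEDEVAC is sent, as in the abstract's description of the myopic policy. The
`Request` record, the rank encoding, and the `travel_time` lookup are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Request(NamedTuple):
    priority_rank: int   # hypothetical encoding: 1 = urgent, 2 = priority
    arrival_time: float
    request_id: str


def next_request(queue: List[Request]) -> Request:
    """Most urgent request first; ties broken first-in-first-out."""
    return min(queue, key=lambda r: (r.priority_rank, r.arrival_time))


def nearest_idle_unit(idle_units: List[int], travel_time: Dict[int, float]) -> int:
    """Myopic choice: the idle MEDEVAC with the shortest expected travel time."""
    return min(idle_units, key=lambda i: travel_time[i])
```

The tuple key in `next_request` is what encodes "priority first, then arrival order".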


Figure  2.  Simulation  Flow  Chart 


The simulation is initiated by randomly generating a casualty event. A Hawkes
spatial generation process is used for casualty event generation. The Hawkes process
models situations where subsequent events are likely to occur in close proximity to the
first event (Kroese & Botev, 2013). Casualty events are generated according to a Poisson
distribution, which models real-world casualty events well. In Step 1, the queue and
system status are updated. In Step 2, if the termination time has not been met, the loop
is continued. In Step 3, we stochastically determine if a casualty event has occurred
(based on $\lambda$) or if a MEDEVAC event has occurred based on travel times and the
stochastic elements. In Step 4, the system, total discounted reward, and queue are
updated, the desired policy (ADP or myopic) is executed, and the simulation returns
to Step 1. Step 4 incorporates the following logic check. If all MEDEVACs are idle and
there is one or more casualty event waiting in the queue, the simulation is terminated
and returns a result of “DidNotFinish”; this prevents premature convergence to a
very poor policy.

IV. Computational Results


In this chapter, we present a notional scenario to which we apply the ADP solution
methods proposed in the previous chapter, and we conduct computational experiments on
that scenario to obtain insights regarding solution quality and computational effort.
We examine different features of the ADP algorithm and different features of the
MEDEVAC dispatching problem to gain further insights regarding the performance of the
proposed solution method. For the computational experiments, we use a dual Intel Xeon
E5-2650v2 workstation with 192 GB of RAM and MATLAB’s parallel computing toolbox.

4.1  Notional  Scenario 

We present a scenario in which a coalition of allied countries performs peacekeeping
operations in response to Islamic State militants in northern Syria. The locations for
MEDEVAC bases are likely key military tactical sites. Casualty collection centers
are selected and weighted by projected enemy locations. Figure 3 shows the 26
casualty cluster centers, five MEDEVAC locations, and two MTFs. Steady state
and high operations tempo are assumed, as is a baseline casualty event arrival rate of
$\lambda = \frac{1}{60}$, representing an average casualty event arrival rate of one event
per hour. Any MEDEVAC is allowed to service any casualty event, and casualty events do
not need to be served as soon as they arrive. Equal proportions of urgent and priority
class casualty event arrivals are assumed. Routine events are not considered due to the
high operational tempo, which is likely given intense combat scenarios where these
routine events would be serviced by CASEVAC or ground MEDEVAC. The reward
function utilizes weights of $\Lambda_1 = 10$ and $\Lambda_2 = 2$, which reward urgent
casualty events much more heavily than priority casualty events.
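
To make the baseline arrival assumptions concrete, the sketch below samples casualty
events at a rate of one per hour with equal urgent and priority proportions. It is a
simplification: it uses a homogeneous Poisson process in place of the Hawkes spatial
process described earlier, it ignores locations, and the function name, time unit
(minutes), and seed are assumptions made for the example.

```python
import random
from typing import List, Tuple


def sample_arrivals(rate_per_min: float, horizon_min: float,
                    p_urgent: float = 0.5, seed: int = 0) -> List[Tuple[float, str]]:
    """Homogeneous Poisson arrivals with an independent urgent/priority label."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate_per_min)   # exponential inter-arrival time
        if t > horizon_min:
            return events
        label = "urgent" if rng.random() < p_urgent else "priority"
        events.append((t, label))
```

Calling `sample_arrivals(1 / 60, horizon_min=24 * 60)` draws one simulated day of
casualty events at the baseline rate.
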
Figure 3. Notional Scenario Disposition

4.2  Experimental  Design 

For the experimental design, we focus on the problem parameter $\lambda$ and the ADP
algorithmic parameters $G$ and $H$. The ADP parameters are set based on our initial
experiences with implementing the model. An aggressive smoothing function is used
because, without it, we obtain cyclical results from the policy improvement steps, as
shown in Figure 4. With smoothing, we rapidly approach the highest quality solution.
We observe that any computation time beyond 20 policy iterations obtains little policy
improvement. Similarly, after 20,000 samples for $H$, we observed no improvement in
performance. To compare solution quality and computation time, we examine
$G = \{5, 10, 20\}$ and $H = \{5{,}000, 10{,}000, 20{,}000\}$ at parameter levels
$\lambda = \frac{1}{30}$, $\frac{1}{60}$, and $\frac{1}{120}$. The casualty arrival rates
were chosen based on high operations tempo for $\lambda = \frac{1}{30}$ and
$\frac{1}{60}$, and because the ADP no longer outperforms the myopic policy around
$\lambda = \frac{1}{120}$.

A $3^3$ factorial design is used to examine the parameters in order to gain the most
fidelity for the full design region. Table 3 shows the factor levels.

Table 3. Experimental Design

$\lambda$    $G$    $H$
1/30         5      5,000
1/60         10     10,000
1/120        20     20,000
4.3  Experimental  Design  Results 

Table 4 reports the results from the experimental design. The best performing
features are highlighted for $\lambda = \frac{1}{30}$ and $\frac{1}{60}$. At
$\lambda = \frac{1}{120}$, multiple features are within a 95% confidence interval. The
general trend for the first two casualty arrival rates is 10 policy improvement loops and
at least 10,000 policy evaluation loops. Computation time scales closely with the product
$G \times H$. In order to effectively evaluate a policy, the ADP requires a minimum of
10,000 policy evaluation loops. The ADP converges quickly to the highest performing
policy, and more than 10 policy improvement loops do not improve the ADP performance.
As $\lambda$ increases, the ADP increasingly outperforms the myopic policy for all ADP
algorithmic changes.

Table 4. Experimental Design Results


Outer   Inner                              Urgent Wait    Priority Wait   MEDEVAC   Computation
$G$     $H$      $1/\lambda$  Improvement      Time in Sec.   Time in Sec.    Busy %    Time in Sec.
5       5000     30     48.82 ± 0.92%    182.5          398.0           90.7%     18.8
10      5000     30     51.67 ± 0.96%    186.8          388.2           90.6%     37.3
20      5000     30     52.75 ± 0.92%    190.8          377.4           90.5%     74.7
5       10000    30     57.80 ± 0.93%    187.9          375.0           90.4%     37.5
10      10000    30     61.93 ± 0.86%    191.7          369.3           88.7%     74.8
20      10000    30     50.39 ± 0.94%    186.7          386.5           90.6%     149.8
5       20000    30     51.42 ± 0.89%    177.8          397.2           90.6%     74.7
10      20000    30     59.63 ± 0.91%    190.5          369.7           89.6%     150.2
20      20000    30     53.65 ± 0.90%    192.0          373.5           90.6%     300.1
5       5000     60     24.22 ± 0.99%    170.0          234.3           84.1%     19.1
10      5000     60     24.93 ± 1.07%    170.4          230.1           83.0%     37.1
20      5000     60     26.79 ± 1.04%    177.1          265.4           80.1%     74.7
5       10000    60     26.84 ± 1.05%    166.7          232.8           83.2%     37.8
10      10000    60     26.98 ± 1.09%    168.2          232.0           82.4%     74.9
20      10000    60     27.10 ± 1.00%    167.9          230.8           82.0%     150.5
5       20000    60     25.25 ± 0.99%    167.0          240.4           84.2%     74.9
10      20000    60     30.80 ± 1.07%    154.0          396.5           81.0%     149.9
20      20000    60     28.22 ± 1.01%    165.6          237.8           81.4%     300.1
5       5000     120    1.13 ± 1.43%     115.2          120.3           34.8%     19.1
10      5000     120    -5.25 ± 1.57%    121.5          132.3           36.4%     37.5
20      5000     120    -4.88 ± 1.55%    117.9          127.0           33.8%     75.4
5       10000    120    -0.60 ± 1.47%    116.6          121.7           35.0%     37.7
10      10000    120    -0.02 ± 1.53%    113.9          121.0           32.3%     74.6
20      10000    120    -3.05 ± 1.53%    119.3          145.4           32.4%     150.1
5       20000    120    0.65 ± 1.55%     113.3          117.5           32.9%     75.2
10      20000    120    -0.74 ± 1.50%    117.2          123.7           35.4%     149.6
20      20000    120    -2.25 ± 1.52%    116.2          129.2           32.7%     300.6
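
As a quick check of the scaling claim, the snippet below divides each computation time
reported in Table 4 for $1/\lambda = 30$ by the product of the two loop counts. The
variable names are illustrative; the numbers are copied from the table.

```python
# (G, H, computation time in seconds) for the 1/lambda = 30 rows of Table 4
runs = [
    (5, 5000, 18.8), (10, 5000, 37.3), (20, 5000, 74.7),
    (5, 10000, 37.5), (10, 10000, 74.8), (20, 10000, 149.8),
    (5, 20000, 74.7), (10, 20000, 150.2), (20, 20000, 300.1),
]

# Milliseconds per sampled state, i.e. time divided by the product G * H.
ms_per_sample = [1000.0 * seconds / (g * h) for g, h, seconds in runs]
spread = max(ms_per_sample) - min(ms_per_sample)
```

All nine ratios come out at roughly 0.75 ms per sampled state, which is consistent with
computation time growing in proportion to $G \times H$.
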
4.4 Baseline Scenario Results and Analysis


The baseline scenario uses the parameters in Table 5. We use the best quality
ADP settings for $G$ and $H$.

Table 5. Baseline Scenario Parameters

Problem Features                                     ADP Algorithm Features
Parameter     Description                 Setting    Parameter   Description               Setting
$1/\lambda$   Casualty Arrival Rate       60         $\phi$      Basis Function            3rd Order
$\Lambda_1$   Weight for urgent event     10         $G$         Policy Improvement Loop   10
$\Lambda_2$   Weight for priority event   2          $H$         Policy Evaluation Loop    20000

Table 6 shows the performance of the ADP against the myopic policy. Based on a
95% confidence interval, the third order ADP outperforms the myopic policy by
30.8 ± 1.4%. The third order ADP also outperforms the myopic policy on the average
casualty event service times for the two casualty classification levels and on the
average proportion of time that the MEDEVACs are busy (which includes time spent
traveling back to base even if they are not actively serving a casualty event). The ADP
policy focuses on servicing urgent casualty events first, as noted by shorter wait times;
moreover, MEDEVACs are being utilized more efficiently, as shown by the lower average
busy percentage. Utilizing a set of third order basis functions achieves the best ADP
performance. Use of a set of fourth order basis functions caused problems with
computational singularity while calculating the LSTD regression (Equation 11), so we did
not experiment with higher order basis functions. There are no statistical differences in
computational times between different orders of $\phi$. We use $\phi$, $\phi^2$, and
$\phi^3$ for the rest of the experiments.

Table 6. ADP Baseline Performance


Policy           Number of         Improvement    Urgent service   Priority service   MEDEVAC
                 Basis Functions   Over Myopic    time (min.)      time (min.)        busy
Myopic           -                 -              285.4            286.7              89.3%
ADP, 1st Order   160               17.7 ± 1.4%    167.0            299.3              87.8%
ADP, 2nd Order   320               26.0 ± 1.5%    165.5            248.9              82.3%
ADP, 3rd Order   480               30.8 ± 1.4%    154.0            235.4              81.0%
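
One reading that is consistent with the basis function counts in Table 6 (160, 320, and
480) is that each order appends elementwise powers of the first order features. The
sketch below illustrates that reading only; the function name is hypothetical and the
thesis may construct its higher order terms differently.

```python
from typing import List


def expand_order(phi: List[float], order: int) -> List[float]:
    """Concatenate phi, phi^2, ..., phi^order, each taken elementwise."""
    return [value ** p for p in range(1, order + 1) for value in phi]
```

Under this reading, a vector of 160 first order features grows to 320 features at second
order and 480 at third order.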

In  Figure  4  we  observe  diminishing  returns  for  policy  improvement.  Moreover, 
without  the  smoothing  function,  we  fail  to  obtain  a  higher  quality  solution.  We  note 
this  behavior  across  all  experimental  levels. 


Figure  4.  Smoothing  vs  Non-Smoothing  ADP  Performance 


Examination of the basis functions from the best performing third order ADP
results provides the following insight. The basis function $\phi_5$, which captures the
interaction between MEDEVAC $i$ and the priority of the casualty event it is serving,
has the largest impact on the policy. The basis function coefficients ($\theta$) of a
MEDEVAC base increase in magnitude with the number and proximity of casualty clusters
near that base. Moreover, MEDEVACs co-located with an MTF show a similar
increase in magnitude, despite being farther away from large groupings of casualty
clusters. Interestingly, $\phi_1$ shows low statistical significance during regression and
also has small coefficient magnitudes for all MEDEVACs. This is because the
ADP policy does not seek to penalize MEDEVACs for being idle, which could otherwise
force them to launch on low priority or faraway casualty events.

Examination of the best ADP policy indicates that it is best to launch MEDEVACs
which are close to casualty events and co-located with hospitals only on high
priority casualty events. It is difficult to determine the dynamics of when, and for
how long, to hold a MEDEVAC in reserve before launching because of the sheer
dimensionality of the system.

We next examine the performance of the ADP against different casualty rates by
adjusting $\lambda$. Figure 5 shows the performance of the ADP against the myopic policy.

Figure 5. Percent Improvement Over Myopic


As $\lambda$ decreases, the frequency with which casualties arrive decreases and the ADP
policies no longer outperform the myopic policy. As casualty events arrive at a slower
rate, the utility of holding MEDEVAC aircraft in reserve is diminished. At a casualty
event arrival rate of one every two hours (i.e., $\lambda = \frac{1}{120}$), we observe a
reduction in performance of the ADP compared to the myopic policy, showing the limitation
of the set of basis functions and their resulting value function approximations. Keneally
et al. (2014) show that as $\lambda$ goes to extremes, the difference in performance
between optimal and myopic policies becomes negligible.

The impact of changing the proportion of casualties from all urgent to all priority
casualty events is shown in Figure 6. We observe a noted improvement over the
myopic policy. This indicates the ADP policy is efficiently managing resources by
not immediately sending the nearest available MEDEVAC, but rather waiting and/or
sending a MEDEVAC that is farther away but is also not close to a high rate casualty
cluster area.


Figure  6.  Performance  of  Changing  Proportion 


We also examine the impact of using a different reward function. Instead of a
simple exponential decay, a monotonically decreasing arctan function is utilized. The
function slowly decays before reaching a threshold in proximity of the requirements
outlined by Department of the Army (2007). Figure 7 shows the comparison of the
exponential decay versus the slower decaying arctan function.

Figure 7. Reward Functions

Table 7 shows the performance of the ADP policy versus the myopic policy with
both reward functions in the baseline scenario. We observe nearly identical performance
between the two reward functions, with overlapping 95% confidence intervals.
These results indicate the ADP policy is robust to changes in the reward function.


Table 7. Reward Function Comparison

Reward Function   Performance
Exponential       30.8% ± 1.40%
Arctan            31.0% ± 1.48%

Lastly, there are experimental rotary wing aircraft which could potentially be put
into service and which can travel significantly faster than the UH-60 A/L Blackhawk.
To examine the impact of these aircraft, the maximum speed with which the MEDEVAC
aircraft can travel is adjusted, and stochastic parameters remain constant. Table
8 shows the performance for each percent increase in average airspeed over the UH-60 A/L.


Table 8. Performance vs Increasing Airspeed

                                                Airspeed Increase vs UH-60
                                                Baseline   25%       50%       75%
Myopic Improvement (Against Baseline Myopic)    -          50.05%    89.45%    126.12%
  Urgent Service Time (min.)                    286.1      168.5     104.8     77.2
  Priority Service Time (min.)                  284.4      166.8     104.1     76.8
  MEDEVAC Busy %                                89.3%      78.6%     61.6%     47.8%
ADP Improvement (Against Baseline Myopic)       30.78%     68.32%    102.81%   127.12%
ADP Improvement (Against Like Myopic)           30.78%     12.17%    7.05%     0.44%
  Urgent Service Time (min.)                    154.0      116.8     90.8      74.8
  Priority Service Time (min.)                  396.5      140.3     103.4     81.5
  MEDEVAC Busy %                                70.2%      69.6%     56.6%     46.9%
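
The two ADP improvement rows in Table 8 appear to be related by a simple ratio: if both
policies are measured against the baseline myopic policy, the like-for-like improvement
is the ratio of the two resulting utilities. The sketch below checks that relationship
against the reported values; the relationship itself is inferred from the numbers, and
the function and variable names are illustrative.

```python
def like_for_like_improvement(adp_pct: float, myopic_pct: float) -> float:
    """ADP improvement over the myopic policy at the same airspeed, in percent.

    Both inputs are improvements over the baseline myopic policy, in percent.
    """
    return 100.0 * ((100.0 + adp_pct) / (100.0 + myopic_pct) - 1.0)


# Airspeed increase -> (ADP, myopic) improvement against baseline myopic, from Table 8
table_8 = {"25%": (68.32, 50.05), "50%": (102.81, 89.45), "75%": (127.12, 126.12)}
like_for_like = {k: like_for_like_improvement(*v) for k, v in table_8.items()}
```

The computed values agree with the "Against Like Myopic" row to within rounding of the
published inputs.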

It is reasonable to assume newer rotary wing designs can increase airspeed by 25%-50%
compared to the UH-60 A/L. This increase in speed has a significant impact
on the overall performance of both the ADP and myopic policies. We see diminishing
returns for the ADP compared to the myopic policy as the airspeed is increased.
Despite this, decision makers can still benefit from high quality dispatching solutions.

When we examine computational effort more closely, we find the ADP solution
only requires about 150 seconds for the baseline scenario. The largest computational
effort came from running the simulation. For each event in the simulation, the best
action must be determined, as indicated by Equation 9. In addition to these
calculations, we require 500 runs in order to achieve our desired confidence interval. The
computation times for the simulation using the myopic and ADP policies are shown
in Table 9.

Table 9. Simulation Run Times


Policy    Computation Time (sec.)   Runs
Myopic    34.625                    500
ADP       884.93                    500
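
The link between the number of runs and the width of the confidence interval can be
sketched as follows. This is a generic normal-approximation interval with the usual 1.96
multiplier; the thesis does not state which interval construction it uses, so both the
formula and the function name are assumptions.

```python
import math
import statistics
from typing import Sequence


def ci95_half_width(samples: Sequence[float]) -> float:
    """Half-width of a normal-approximation 95% confidence interval for the mean."""
    return 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
```

For a fixed spread in the simulated rewards, the half-width shrinks with the square root
of the number of runs, so quadrupling the runs roughly halves the interval.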

The computation time of calculating Equation 9 is about 25 times that of the
hard-coded myopic policy. Despite this burden, it is still possible to compute the ADP
policy and run the simulation in under 20 minutes. These results are promising, as
the model could readily be adapted and applied to current operations to yield timely
results.

V. Conclusions


This thesis examines the MEDEVAC dispatching problem. The intent of the research
is to determine policies that increase the survivability of battlefield casualties.
Development of an MDP model of the MEDEVAC dispatching problem enables examination
of many different scenarios. Solving the MDP requires the use of ADP.
By using the post-decision state and approximating the value of being in that state,
our model became computationally tractable. To examine the performance of policies
produced by our model, we created a scenario and simulated the outcome of the
established policies.

The ADP policy was able to increase overall utility by 30% compared to the myopic
policy in our baseline scenario. Additionally, MEDEVAC busy time was decreased
by 9%, indicating more efficient use of MEDEVAC aircraft. The basis function
coefficients revealed MEDEVAC aircraft in close proximity to higher probability casualty
clusters were more valuable than aircraft based farther away. This is an intuitive
result. These higher value MEDEVACs should likely not be dispatched for low priority
casualty events while the lower value MEDEVACs may be dispatched instead. The
ADP policy was able to capture the overall time it would take for any MEDEVAC to
service any casualty event. This is important, as a MEDEVAC may possibly become
available which could service a casualty event faster than an idle MEDEVAC that
is farther away. Maximum speed of the aircraft has the largest impact on performance.
Results indicate a 25% increase in speed increases utility by 50%. Even with
the performance increase from speed, the ADP policy still provides increased utility
compared to the myopic policy.

This model and its results are beneficial to military planners and decision makers.
Military planners can use this model to compare policies as well as evaluate
different potential MEDEVAC station locations in order to maximize performance.
Decision makers can use current military intelligence and operational experience to
identify areas in which casualties are likely to occur. Once these areas are identified,
decision makers can make informed decisions about the value of each MEDEVAC
and maximize the utilization of their resources.

Results also indicate the criticality of MEDEVAC travel speed. Military planners
and acquisitions (those responsible for implementing new technology into the military)
can use this model to examine the impact of capacity compared to speed. This
information can be used for future design and development of a replacement for the
UH-60 A/L. Perhaps a mix of large capacity UH-60 A/Ls and a new lower-capacity,
high-speed design would improve overall casualty survivability.

The model does not take into account dynamic repositioning or dispatching. Aircraft
are required to return to base before they become available. In many situations,
decision makers will immediately dispatch a MEDEVAC that has dropped off a casualty
event at an MTF, but has not yet returned to its original base, to a new casualty
event. This possibility was not considered in this thesis, as crew limitations do not
always allow this decision to be feasible.

Dynamic in-flight rerouting is not considered. If a MEDEVAC has capacity to
take on additional casualties and a casualty event occurs in close proximity, it may be
worth sacrificing time for the casualties onboard in order to reduce the service time
for the new casualty event. Communication limitations as well as uncertainty about
specific casualty events (e.g., the actual condition of the casualty event onboard versus
the survivability function) make this a complex decision. However, for low priority
casualty events, dynamic rerouting would likely have significantly less negative impact
on those events and have a large positive impact on the casualty event which would
be served faster.

Future extensions to this model could include dynamic routing and rerouting,
comparing different aircraft types, and comparing MEDEVAC placement (e.g., multiple
aircraft at a base and some bases empty). For a long range planning tool,
researchers should focus on the impact of MEDEVAC placement and aircraft designs.
These results could provide valuable insight for military planners.

Implementing the ADP policy into active operations is an incredibly difficult
proposition. The myopic policy is often used because it is simple to implement and
performs fairly well as long as casualty events arrive no more often than about once
every two hours. The important point for decision makers is to understand the value
of specific MEDEVACs and how to utilize them efficiently.

VI. Appendix


6.1  Appendix  A 


Table 10. Parameters

Symbol            Description
$\lambda$         Arrival rate of casualty events
$S$               The current state of the system
$\tau$            Current system time
$e$               Current event
$a$               Number of MEDEVAC units
$m_i$             Information about the $i$th MEDEVAC
$b$               Size of the casualty event queue
$q_j$             Information about the casualty event in the $j$th position of the queue
$\sigma_i$        Status of the MEDEVAC
$d_i$             Expected time to complete current movement
$t_i$             Start time of current MEDEVAC movement
                  Status of the casualty event
$l_j$             Location of the casualty event
                  Time the casualty event arrived in the system
$p_j$             Priority of the casualty event
$x_k$             Decision after the $k$th event
$u$               Stochastic process for state transition
$\Lambda$         Utility multiplier for casualty event priorities
$\alpha$          Harmonic step size parameter
$\hat{d}_{i,k}$   Expected time for MEDEVAC $i$ to return to base after event $k$
$d_{i,j}$         Expected time from base $i$ to pick up and drop off casualty event $j$
$G$               The number of Policy Improvement Loops
$H$               The number of states sampled per Policy Evaluation Loop
                  The set of idle MEDEVAC aircraft
                  The set of idle casualty events
                  The set of feasible actions


6.2 Appendix B